Analysis of popular attractions in Glasgow based on Tripadvisor data

2519347w

04.04.2020

1. Introduction

Tourism is one of Scotland's most important industries, accounting for 5% of GDP and 8.5% of employment (Scottish Government, 2020). In 2017, Glasgow City Council took the lead in launching Glasgow's Tourism and Visitor Plan, which aims to bring 771 million pounds of economic growth to Glasgow and create 6,600 new jobs (Glasgow City Council, 2017).

As early as 2008, studies found that online comments have a significant influence on tourists' choices (Gretzel and Yoo, 2008). Studying the online comments on Glasgow's main attractions is therefore useful to the city's tourism industry.

TripAdvisor is currently the world's largest online travel review platform. As of 2020, the total number of user comments on TripAdvisor had reached 884 million. According to its own statistics, 72% of website visitors often refer to relevant comments when deciding where to travel, where to stay, and where to eat (TripAdvisor, 2019). The image of Glasgow attractions on TripAdvisor will therefore affect the decisions of many potential tourists. The large volume of comments left by tourists can also help us understand the key features of Glasgow attractions, which can serve as a basis for policymakers formulating policies to promote Glasgow tourism.

In summary, this research attempts to answer the following three questions:

  1. What are the Glasgow Top100 attractions on TripAdvisor? What are their locations, popularity, and ratings?
  2. TripAdvisor has put forward the concept of Top15 in each city because most tourists from other cities and countries will only choose their destination from the Top15. So, what are the Top15 attractions of Glasgow? What kind of tourists do these attractions attract? What comments did the tourists leave? How have these comments changed over time?
  3. Because TripAdvisor is an open platform that also needs to be profitable, the ranking and comment data of attractions are inevitably mixed with varying degrees of noise. This research therefore also explores which attractions have so much noise in their comments that they rank higher than they should.

2. Data

2.1 Data collection

The data used in this research is divided into two parts. The first part is the basic information of Glasgow Top100 attractions. The second part is details of Glasgow Top15 attractions' visitors and their comments. Because TripAdvisor strictly controls its API, these two parts are collected through data scraping.

The first part is collected in Python by parsing the page HTML with the 'BeautifulSoup' package and geocoding with the 'Nominatim' service. For brevity, the scraping code is stored in the attachment; see Appendix1 Top100data.ipynb or Appendix1 Top100data.html for details. This part includes the name, rating, comment volume, cover image and geographic coordinates of Glasgow's Top100 attractions.

TripAdvisor loads content dynamically: long comments must be clicked to load in full, and its anti-crawler mechanism blocks IP addresses with a high request frequency. The second part of the data is therefore scraped using 'Webharvy', which imitates manual clicks. Fifteen XML files containing the scraping rules are placed in the appendix; see Appendix2 Webhary XML.rar for details.

2.2 Data cleaning

The Top100 data was cleaned as it was scraped and is stored in the 'top100.csv' file.

The raw Top15 data captured by 'Webharvy' consists of 15 separate CSV files. For brevity, the cleaning process is also stored as a separate file. Appendix3 Top15data.ipynb and Appendix3 Top15data.html show the cleaning process for the raw data of one attraction; the other 14 files follow the same processing flow. Finally, the 15 cleaned data sets are merged into the top15.csv file.

In some rare cases, the 'date of experience' is missing. Comments on TripAdvisor are sorted by time, except on the homepage, and the date is only accurate to the month, so a missing 'date' is very close or equal to the value immediately before or after it. This research therefore fills each missing 'date' with the adjacent value, working from front to back.
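The fill rule described above can be sketched in plain Python. This is a minimal illustration, not the code in Appendix3: the function name `fill_missing_dates` and the sample values are hypothetical, missing dates are assumed to be stored as `None`, and the fallback for a gap at the very start of the list is an added assumption.

```python
def fill_missing_dates(dates):
    """Fill None entries with the previous known value (forward fill).

    A gap at the very start has no previous value, so it takes the
    next known value instead (an assumption made for this sketch).
    """
    filled = []
    last_seen = None
    for value in dates:
        if value is not None:
            last_seen = value
        filled.append(last_seen)

    # Back-fill any gap that is still open at the start of the list.
    next_seen = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is not None:
            next_seen = filled[i]
        else:
            filled[i] = next_seen
    return filled


# Hypothetical 'date of experience' values, accurate to the month,
# in the newest-first order in which comments are listed.
sample = [None, "2019-08", None, "2019-07", None, None, "2019-06"]
print(fill_missing_dates(sample))
```

Because neighbouring comments are at most about a month apart, the filled value is off by one month at worst.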

3. Top100 attractions data analysis

3.1 Cover image of Top100

The 100 small images in the following figure show the cover images of Glasgow's top 100 destinations on TripAdvisor. For those who do not know Glasgow, the quality of the cover image will strongly affect their first impression. Some attractions have clear and attractive images, while others are confusing. For example, the Riverside Museum's main building does not appear in its own cover image (the second picture); instead, it appears as the background of the 19th small image.

Staff at the relevant attractions should contact TripAdvisor and provide high-quality photographs, so that potential visitors get a clear and positive first impression.

3.2 The distribution of Top100 attractions

It can be seen from the map that most of Glasgow's Top 100 attractions are in the city centre and west end. The two areas are closely connected geographically, so well-planned tourist signage and public transport can encourage visitors to visit more attractions in a day and stimulate more spending. Drag and zoom the map to see the specific locations and clustering of the Top100. Click on a mark to get the name of the attraction.

3.3 The distribution of Top100 Comment Volume

It can be seen from the table that the number of comments per Top100 attraction has a minimum of 16, a maximum of 15,299, a median of 287, a mean of 817, and a standard deviation of 1893. The attractions therefore differ greatly, and a mean far above the median shows that a few attractions receive a disproportionate share of the comments.

To keep the map readable, the upper limit of its scale is 3000. In fact, Kelvingrove Art Museum, The Riverside Museum and Buchanan Street each have more than 3000 comments.

Although the city centre has more Top100 attractions, the attractions in the west end have the largest number of comments, which means that the west end attracts more tourists. Click a mark to get the name and specific comment volume.
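The summary statistics quoted above can be reproduced with Python's standard `statistics` module. The sketch below uses made-up comment counts, not the values in 'top100.csv'; the helper name `describe` is hypothetical, and the sample standard deviation is assumed.

```python
import statistics


def describe(values):
    """Return the summary statistics quoted in the text for a list of counts."""
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (divides by n - 1); an assumption here.
        "std": statistics.stdev(values),
    }


# Hypothetical comment counts for five attractions.
comment_counts = [16, 120, 287, 450, 15299]
summary = describe(comment_counts)
print(summary)
```

In this toy sample, as in the real data, one very large value pulls the mean well above the median, which is why the median is the better guide to a typical attraction.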

3.4 Distribution of Top100 ratings

The ratings of the Top100 are similar: the mean rating is 4.29 and the standard deviation is 0.49. The exception is 'Elfingrove', with a score of only 1.5. It is a regular event held every year in the square in front of the Kelvingrove Art Museum, and one of the very few entries in the top 100 that is an event without a fixed space. Clearly, visitors had a poor experience of this event. Most attractions are rated 4.5. Because ratings are shown in units of 0.5, actual ratings between 4.3 and 4.8 are all displayed as 4.5. Click a mark to get the name and specific rating.

4. Top15 attractions data analysis

4.1 Top15 attractions basic information

As can be seen from the table below, the Top15 attractions fall into a small number of types: four museums, three distilleries or breweries, two sports stadiums, two gardens/parks, one university, one cemetery, one religious site and one pedestrian street. The map shows that the Top15 are mainly distributed in the west end and city centre.

Figure 1 and Figure 2 show that the ranking of each attraction is not based simply on the total number of comments or the average rating, and TripAdvisor has not published a specific ranking formula.

The number of comments shown in the figure differs from the number shown for the Top100 because 3%-7% of the comments are written in other languages and are not included in this analysis. All Top15 scores are in the range of 4.5±0.3. The Clydeside Distillery has the highest average rating, and Glasgow Science Centre has the lowest.

4.2 Contributions and helpful votes

On TripAdvisor, the number of comments a reviewer has posted is shown next to the reviewer's avatar as the 'contribution' value, together with the total number of 'helpful votes' the reviewer has received from other users. These values capture characteristics of visitors, such as how keen they are to post comments and how much other TripAdvisor users value their comments.

Reviewers with high contributions and high helpful votes are referred to as 'enthusiasts' in this study. Their comments tend to have a greater impact on readers than other comments. However, because TripAdvisor displays comments mainly in chronological order unless the reader changes the settings, readers have little motivation to keep turning pages to find them.

Figures 3, 4, and 5 show that the number of helpful votes and the number of contributions have a significant positive linear relationship. The most popular attractions among 'enthusiasts' are Glasgow Cathedral and The Tenement House. There are also some anomalies: the median contributions and helpful votes of reviewers of The Clydeside Distillery, Tennents Wellpark Brewery and Celtic Park are low. This point is analysed further in the section on noise in the ranking.
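The two quantities discussed above, the per-attraction medians and the linear relationship between contributions and helpful votes, can be sketched as follows. The reviewer records are invented for illustration, and the Pearson coefficient is written out by hand as one common way to measure a linear relationship; the report does not state which statistic underlies the figures.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical reviewer records: (attraction, contributions, helpful_votes).
reviews = [
    ("Glasgow Cathedral", 120, 60),
    ("Glasgow Cathedral", 80, 35),
    ("Glasgow Cathedral", 200, 110),
    ("The Clydeside Distillery", 3, 1),
    ("The Clydeside Distillery", 5, 2),
    ("The Clydeside Distillery", 40, 18),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Group the reviewer records by attraction.
by_attraction = defaultdict(list)
for name, contributions, votes in reviews:
    by_attraction[name].append((contributions, votes))

# Median (contributions, helpful votes) of the reviewers of each attraction.
medians = {
    name: (
        statistics.median(c for c, _ in rows),
        statistics.median(v for _, v in rows),
    )
    for name, rows in by_attraction.items()
}

r = pearson([c for _, c, _ in reviews], [v for _, _, v in reviews])
print(medians)
print(round(r, 3))
```

Medians are used rather than means because a handful of very active reviewers would otherwise dominate the comparison between attractions.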

Figure 6 shows the relationship between 'average rating' and 'helpful votes' for each attraction. Overall, 'enthusiasts' rate more strictly than other reviewers, but the situation differs between the Top15 attractions. A line with a gentler slope suggests that the attraction is more 'worthy of its name', or meets the expectations of 'enthusiasts'. A line with a steeper slope suggests that the attraction's actual quality may be lower than 'enthusiasts' expect. The line with the steepest decline in the Top15 represents Celtic Park; the flattest lines represent The Riverside Museum and The Tenement House.
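The slope comparison above can be made concrete with an ordinary least-squares fit of rating against helpful votes. The sketch below is an assumption about how such lines could be fitted, not the method behind Figure 6; `ols_slope` and both data series are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical data: reviewers' helpful votes and the ratings they gave.
helpful_votes = [0, 10, 20, 30]
ratings_flat = [4.5, 4.5, 4.4, 4.4]   # ratings barely change with experience
ratings_steep = [4.8, 4.4, 4.0, 3.6]  # experienced reviewers rate much lower

flat = ols_slope(helpful_votes, ratings_flat)
steep = ols_slope(helpful_votes, ratings_steep)
print(flat, steep)
```

A slope close to zero corresponds to the flat lines in the figure, while a strongly negative slope corresponds to attractions that 'enthusiasts' rate well below other reviewers.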

4.3 Travel type

At the end of each comment, some reviewers choose to show their travel type. There are six possibilities: travelled solo, travelled with family, travelled on business, travelled with friends, travelled as a couple, or not given. The distribution of travel types indicates the main target groups of each attraction, and staff can optimise the attraction according to the characteristics of these groups.

The following six figures show that Buchanan Street and The Necropolis are the most popular with solo visitors. The most popular among families is Glasgow Science Centre, where family visits account for close to 50% of comments. It is worth noting that the actual visitors are mainly children, while the reviewers are mostly their parents, so the ratings do not directly reflect the children's experience. The most popular among business visitors is the University of Glasgow, although business travel has the lowest proportion of all travel types. The most popular with groups of friends are Tennents Wellpark Brewery and Glengoyne Distillery, a well-known brewery and distillery respectively. The most popular with couples is The Tenement House. Finally, the comment information for The Clydeside Distillery and Ibrox Stadium is abnormal: for both, the proportion of 'not given' is close to 70%.
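The shares quoted above are the proportion of each travel type within each attraction's comments. A minimal sketch of that calculation is given below; the function name `travel_type_shares` and the eight sample records are hypothetical and do not reproduce the real proportions.

```python
from collections import Counter, defaultdict


def travel_type_shares(rows):
    """Return {attraction: {travel_type: share of that attraction's comments}}."""
    counts = defaultdict(Counter)
    for attraction, travel_type in rows:
        counts[attraction][travel_type] += 1
    return {
        attraction: {t: n / sum(c.values()) for t, n in c.items()}
        for attraction, c in counts.items()
    }


# Made-up records: (attraction, travel type stated by the reviewer).
rows = [
    ("Glasgow Science Centre", "family"),
    ("Glasgow Science Centre", "family"),
    ("Glasgow Science Centre", "couple"),
    ("Glasgow Science Centre", "not given"),
    ("Ibrox Stadium", "not given"),
    ("Ibrox Stadium", "not given"),
    ("Ibrox Stadium", "not given"),
    ("Ibrox Stadium", "friends"),
]

shares = travel_type_shares(rows)
print(shares["Ibrox Stadium"]["not given"])
```

Keeping 'not given' as its own category, rather than dropping it, is what makes unusually high non-response visible.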

4.4 Time series analysis

4.4.1 Comment volume change by time

Figure 13 shows that the number of comments on the Top15 attractions grew rapidly from 2010 to 2016, which is related to the surge in TripAdvisor users and the development of Glasgow's tourism. However, the number of reviews has declined since 2017. This may be related to TripAdvisor's market share, its review rules, or Glasgow tourism itself; the real reasons need further analysis. The COVID-19 pandemic in 2020 had a severe impact on Glasgow's tourism, and the number of comments dropped by 80% compared with previous years.

Figures 14 and 15 show that the peak season for tourism in Glasgow is summer and autumn, with the largest number of tourists in July and August each year. These months are also when the weather is at its best.

Figures 16 and 17 show that, unlike the overall trend, the number of comments on Ibrox Stadium and The Tenement House increased year by year until 2020. Celtic Park's comments surged in 2019, which was related to the important football matches it hosted. The Clydeside Distillery only has reviews since 2017, so comparisons are difficult. Because Celtic Park and Ibrox Stadium host sports events, their seasonal peaks differ from those of other attractions.
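The yearly and monthly comment volumes discussed in this subsection come from counting comments by the year and by the calendar month of their date. The sketch below shows one way to do this; the 'YYYY-MM' date format and the six sample dates are assumptions, since the cleaned format in top15.csv is not stated here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical 'date of experience' values, assumed to be 'YYYY-MM' strings.
dates = ["2016-07", "2016-08", "2017-07", "2017-12", "2018-08", "2018-08"]

parsed = [datetime.strptime(d, "%Y-%m") for d in dates]

# Comments per year shows the long-term trend.
per_year = Counter(d.year for d in parsed)

# Comments per calendar month, pooled over all years, shows seasonality.
per_month = Counter(d.month for d in parsed)

print(sorted(per_year.items()))
print(per_month.most_common(1))
```

Pooling months across years separates the seasonal pattern from the long-term growth or decline in volume.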

4.4.2 Average rating change by time

The change in average rating over time shows how the popularity of an attraction among visitors has changed. The specific reasons behind rises and falls in the score deserve further in-depth analysis.

Figures 19 and 20 show that the average rating of the Top15 attractions has gradually increased over time. However, Figure 21 shows that the situation differs for each attraction. The attractions whose average rating increased significantly over time are The Riverside Museum, Celtic Park and Ibrox Stadium. Those whose average rating decreased significantly are Tennents Wellpark Brewery and Buchanan Street.
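One simple way to compare rating trends between attractions is to average the ratings per attraction and year and then look at the change between the first and last year. This is a sketch under that assumption, with invented ratings and hypothetical function names; it is not the procedure used to produce the figures.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean_ratings(rows):
    """Return {attraction: {year: mean rating}} with years in ascending order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for attraction, year, rating in rows:
        grouped[attraction][year].append(rating)
    return {
        attraction: {year: mean(r) for year, r in sorted(years.items())}
        for attraction, years in grouped.items()
    }


def rating_change(yearly):
    """Difference between the mean rating of the last and the first year."""
    years = sorted(yearly)
    return yearly[years[-1]] - yearly[years[0]]


# Made-up records: (attraction, year of experience, rating from 1 to 5).
rows = [
    ("The Riverside Museum", 2015, 4), ("The Riverside Museum", 2015, 4),
    ("The Riverside Museum", 2019, 5), ("The Riverside Museum", 2019, 4),
    ("Buchanan Street", 2015, 5), ("Buchanan Street", 2015, 5),
    ("Buchanan Street", 2019, 4), ("Buchanan Street", 2019, 5),
]

yearly = yearly_mean_ratings(rows)
changes = {name: rating_change(years) for name, years in yearly.items()}
print(changes)
```

Years with very few comments give unstable averages, so in practice a minimum comment count per year would be a sensible filter before comparing changes.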

6. Noise in the ranking

Because the platform is open and needs to be profitable, some comments and rankings will inevitably be unreliable. It is therefore necessary to identify which popular attractions are ranked too high. The analyses above show that several aspects of the data for 08 The Clydeside Distillery are problematic:

  1. Compared to other attractions, the total number of comments is low, and the first comment appeared in 2017.
  2. The low median contributions and helpful votes indicate that the vast majority of commenters are newcomers or newly registered accounts.
  3. The proportion of 'not given' travel type is abnormally high.

Based on these observations, we can infer that the ranking of 08 The Clydeside Distillery is likely inflated, and there may be fake reviews or commercial promotion. However, TripAdvisor has not published any relevant annotation.

4.5 Word cloud analysis

Glasgow's Top15 attractions have 44,032 reviews, which contain much helpful information. A word cloud analysis of these comments shows which words visitors mention most, and these words help to reveal the essential characteristics of each attraction in visitors' minds. This research uses 'textblob' for the word cloud analysis.

This research first analysed the content and titles of all Top15 reviews together and then produced a word cloud image for the content and titles of each attraction. For reasons of space, only one analysis process is shown here. The complete code is in Appendix4 Wordcloud.ipynb and Appendix4 Wordcloud.html.
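A word cloud is driven by word frequencies, so the core of the analysis is a count of the most common words after removing stopwords. The sketch below illustrates that step with the standard library only; it deliberately swaps in a regular-expression tokeniser and a tiny hand-written stopword list instead of 'textblob', and it is not the code in Appendix4. The name `top_words` and the sample comments are hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "of", "to", "in", "it",
             "very", "this"}


def top_words(comments, n=200):
    """Return the n most frequent non-stopword tokens with their counts."""
    counts = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 1)
    return counts.most_common(n)


# Made-up review texts.
comments = [
    "The museum is free and the building is beautiful.",
    "Great museum, a very old building with lots of history.",
    "Loved this museum!",
]

print(top_words(comments, n=2))
```

The default of `n=200` mirrors the 'Top200 words' shown in the word cloud figures.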

As can be seen from the figure, the most mentioned word across all Top15 reviews is 'museum'. This is expected, because comments on the Kelvingrove Art Museum and The Riverside Museum account for 52% of the total.

Among the other words that are not degree words, 'street', 'building', 'shop', 'old', and 'history' also have high frequencies. These represent the most memorable features of Glasgow attractions. In the analysis of the titles of all Top15 comments, words such as 'family' and 'transport' also appear. Since high scores account for a large proportion of reviews, these words are mostly associated with positive sentiment.

Top200 words in comment of Kelvingrove Museum (above)

Top200 words in comment title of Kelvingrove Museum (above)

Top200 words in comment of The Riverside Museum (above)

Top200 words in comment title of The Riverside Museum (above)

Top200 words in comment of Glengoyne Distillery (above)

Top200 words in comment title of Glengoyne Distillery (above)

Top200 words in comment of Celtic Park (above)

Top200 words in comment title of Celtic Park (above)

Top200 words in comment of University of Glasgow (above)

Top200 words in comment title of University of Glasgow (above)

Top200 words in comment of The Necropolis (above)

Top200 words in comment title of The Necropolis (above)

Top200 words in comment of Tennents Wellpark Brewery (above)

Top200 words in comment title of Tennents Wellpark Brewery (above)

Top200 words in comment of The Clydeside Distillery (above)

Top200 words in comment title of The Clydeside Distillery (above)

Top200 words in comment of Ibrox Stadium (above)

Top200 words in comment title of Ibrox Stadium (above)

Top200 words in comment of Glasgow Botanic Gardens (above)

Top200 words in comment title of Glasgow Botanic Gardens (above)

Top200 words in comment of Glasgow Science Centre (above)

Top200 words in comment title of Glasgow Science Centre (above)

Top200 words in comment of Glasgow Cathedral (above)

Top200 words in comment title of Glasgow Cathedral (above)

Top200 words in comment of Buchanan Street (above)

Top200 words in comment title of Buchanan Street (above)

Top200 words in comment of Pollok Country Park (above)

Top200 words in comment title of Pollok Country Park (above)

Top200 words in comment of The Tenement House (above)

Top200 words in comment title of The Tenement House (above)

In the word cloud of each attraction, some words deserve attention. For example, the most frequent word in The Tenement House's review titles is 'back', which is rare for other attractions. The original comments show why: the attraction is a museum of Glasgow housing history, and many visitors feel they are going back to the past when they visit, so 'back' appears very frequently.

5. Conclusion

In this research, two data sets were established: the basic information of Glasgow's Top 100 attractions on TripAdvisor, and all English-language comments on the Top 15 attractions together with relevant information about the reviewers. These two data sets are valuable for follow-up research.

This research conducted various analyses and visualizations of these two data sets. The basic information of the Top100 attractions is displayed on maps. The Top15 attractions were analysed in detail, including the relationship between average rating and whether the reviewer is an 'enthusiast', the proportion of each travel type at each attraction, and the changes in comment volume and average rating of each attraction over time. The analysis also identified an attraction with a likely inflated ranking. Finally, this research conducted a word cloud analysis of all the Top15 comments to show the words visitors mention most.

This research has some limitations. First, a small proportion of comments on TripAdvisor are not in English. Since these comments span dozens of languages, this analysis did not include them. However, studying them could be very effective in understanding the experience of non-English-speaking visitors, whom Glasgow's Tourism and Visitor Plan also aims to attract.

Second, some visitors also uploaded photos with their comments. These photos have high analytical value. However, a preliminary analysis of the photos of some attractions found that the kinds of photos visitors take depend strongly on the type of attraction. For example, photos of museums mainly show exhibits, while photos of public spaces such as parks and streets are more varied. This makes comparison across attractions difficult. In addition, current computer vision mainly focuses on semantic segmentation, object detection and image captioning, so analysing the photographer's emotions or specific key objects, for example how often a specific painting appears, remains difficult. For these reasons, this research eventually abandoned this part of the analysis.

Third, when drawing the word cloud for each attraction, splitting the text into single words fragments its meaning. As a result, many generic descriptive words appear in the word clouds. If phrases could be analysed instead, more helpful information could be obtained.

In future research, it is worth trying to build a regression model of keywords/phrases against their related scores. This could show which words/phrases have a significant relationship with lower scores, which would provide a useful basis for increasing the attractiveness of attractions.
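The phrase-level analysis suggested above could start from counts of adjacent word pairs (bigrams) instead of single words. The following is a minimal sketch of that idea with invented comments; `top_bigrams` is a hypothetical helper, and pairs are formed across punctuation for simplicity.

```python
import re
from collections import Counter


def top_bigrams(comments, n=10):
    """Return the n most frequent adjacent word pairs with their counts."""
    counts = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Pair each token with its successor within the same comment.
        counts.update(zip(tokens, tokens[1:]))
    return [(" ".join(pair), c) for pair, c in counts.most_common(n)]


# Made-up review texts.
comments = [
    "Well worth a visit, the guided tour was great.",
    "The guided tour is well worth the money.",
    "Not worth it, sadly.",
]

bigram_counts = dict(top_bigrams(comments, n=50))
print(bigram_counts["well worth"], bigram_counts["not worth"])
```

In a single-word count, 'worth' appears three times and hides the difference between 'well worth' and 'not worth', which is exactly the fragmentation described in this limitation.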

Word count: 2931

References:

GLASGOW CITY COUNCIL. 2017. Glasgow’s Tourism and Visitor Plan to 2023 [Online]. Available: https://glasgowtourismandvisitorplan.com/tourism-and-visitor-plan/.

SCOTTISH GOVERNMENT. 2020. Tourism and events [Online]. Available: https://www.gov.scot/policies/tourism-and-events/ [Accessed 01 April 2021].

GRETZEL, U. & YOO, K. H. 2008. Use and impact of online travel reviews. Information and communication technologies in tourism 2008, 35-46.

TRIPADVISOR. 2019. Online Reviews Remain a Trusted Source of Information When Booking Trips, Reveals New Research [Online]. Tripadvisor. Available: https://tripadvisor.mediaroom.com/2019-07-16-Online-Reviews-Remain-a-Trusted-Source-of-Information-When-Booking-Trips-Reveals-New-Research [Accessed 01 April 2021].